Abstract

In a modern recommender system, it is important to understand 
how products relate to each other. For example, while a user is 
looking for mobile phones, it might make sense to recommend 
other phones, but once they buy a phone, we might instead want 
to recommend batteries, cases, or chargers. These two types
of recommendations are referred to as substitutes and
complements: substitutes are products that can be purchased
instead of each other, while complements are products that can
be purchased in addition to each other.

Here we develop a method to infer networks of substitutable 
and complementary products. We formulate this as a supervised 
link prediction task, where we learn the semantics of
substitutes and complements from data associated with products. The
primary source of data we use is the text of product reviews, 
though our method also makes use of features such as ratings, 
specifications, prices, and brands. Methodologically, we build 
topic models that are trained to automatically discover topics 
from text that are successful at predicting and explaining such 
relationships. Experimentally, we evaluate our system on the 
Amazon product catalog, a large dataset consisting of 9 million 
products, 237 million links, and 144 million reviews. 

1 Introduction 

Recommender systems are ubiquitous in applications ranging 
from e-commerce to social media, video, and online news
platforms. Such systems help users to navigate a huge selection
of items with unprecedented opportunities to meet a variety of
special needs and user tastes. Making sense of a large number
of products and driving users to new and previously unknown
items is key to enhancing user experience and satisfaction.

While most recommender systems focus on analyzing patterns
of interest in products to provide personalized
recommendations [30, 34, 36], another important problem is to
understand relationships between products, in order to surface
recommendations that are relevant to a given context.

*jmcauley@ucsd.edu 

†rahul@pinterest.com

‡jure@cs.stanford.edu


For example, when a user in an online store is examining
t-shirts she should receive recommendations for similar t-shirts,
or otherwise jeans, sweatshirts, and socks, rather than (say) a
movie, even though she may very well be interested in it. From
these relationships we can construct a product graph, where
nodes represent products, and edges represent various types of
product relationships. Such product graphs facilitate many
important applications: navigation between related products,
discovery of new and previously unknown products, identification
of interesting product combinations, and generation of better
and more context-relevant recommendations.

Despite the importance of understanding relationships between
products, there are several interesting questions that make
the problem of building product graphs challenging: What are
the common types of relationships we might want to discover?
What data will allow us to reliably discover relationships
between products? How do we model the semantics of why certain
products are related? For example, the semantics of why
a given t-shirt might be related to a particular pair of jeans are
intricate and can only be captured by a highly flexible model.
And finally, how do we scale up our methods to handle graphs
of millions of products and hundreds of millions of relations?

Inferring networks of product relationships. Here we are
interested in inferring networks of relationships between
millions of products. Even though our method can be used to learn
any type of relationship, we focus on identifying two types of
links between products: substitutes and complements.
Substitutable products are those that are interchangeable, such as
one t-shirt for another, while complementary products are those
that might be purchased together, such as a t-shirt and jeans.

We design a system titled Sceptre (Substitute and
Complementary Edges between Products from Topics in Reviews), that
is capable of modeling and predicting relationships between
products from the text of their reviews and descriptions. At its
core, Sceptre combines topic modeling and supervised link
prediction, by identifying topics in text that are useful as features
for predicting links between products. Our model also handles
additional features such as brand, price, and rating information,
and product category information, and allows us to predict
multiple types of relations (e.g. substitutes and complements)
simultaneously. Moreover, Sceptre harnesses the fact that products
are arranged in a category hierarchy and allows us to extend
Figure 1: Sceptre learns the concept of substitute and
complement goods from product information (descriptions, reviews,
etc.). Given a query item, Sceptre allows us to generate
substitute and complementary recommendations as shown above.


this hierarchy to discover ‘micro-categories’: fine-grained
categories of closely related products.

An example of the output of Sceptre is shown in Figure 1.
Here, given a query item (a hiking boot), our system identifies 
a ranked list of potential substitutes (other hiking boots), and 
complements (heavy-duty socks, shoe polish, etc.). 

We train Sceptre on a large corpus of 9 million products from
Amazon, with 237 million connections derived from browsing
and co-purchasing data. We evaluate Sceptre in terms of its
accuracy at link prediction and ranking, where we find it to be
significantly more accurate than alternatives. We also use
Sceptre to build a product graph, where for every product we
recommend a list of the most related complementary and substitutable
products. Finally, we show that Sceptre can be applied in
‘cold-start’ settings, by using other sources of text when reviews are
unavailable. Overall, we find that the most useful source of
information to identify substitutes and complements is the text
associated with each product (i.e., reviews, descriptions, and
specifications), from which we are able to uncover the key
features and relationships between products, and also to explain
these relationships through textual signals.

We envision several applications of the product graphs
produced by our system. Our system can help users to navigate,
explore and discover new and previously unknown products.
Or, it can be used to identify interesting product combinations,
e.g. we can recommend outfits by matching a shirt with
complementary trousers and a jacket. And, our system can be used as a
candidate-generation step in providing better and more
context-relevant recommendations.


2 Related work 

The basic task of a recommender system is to suggest relevant
items to users, based on their opinions, context, and behavior.
One component of this task is that of estimating users’ ratings
or rankings of products, e.g. by matrix factorization
or collaborative filtering. Our goal here is related but
complementary to rating estimation as we aim to discover relations
between products.

In principle the types of relationships in which we are
interested can be mined from behavioral data, such as browsing
and co-purchasing logs. For example, Amazon allows users to
navigate between products through links such as ‘users who
bought X also bought Y’ and ‘users who viewed X also viewed
Y’. Such a ‘co-counting’ solution, while simple, has a
few shortcomings: for example, it may produce noisy
recommendations for infrequently-purchased products, and has
limited ability to explain the recommendations it provides. More
sophisticated solutions have been proposed that make use of
browsing and co-purchasing data, but in contrast to
such ‘behavioral-based’ solutions our goal is to learn the
semantics of ‘what makes products related?’ in order to generate
new links, adapt to different notions of relatedness, and to
understand and explain the features that cause humans to consider
products to be related.

Topic models are a fundamental building block of text
modeling and form the cornerstone of our model. A variety
of works have used topic models within recommender systems
(e.g. [10, 11, 22, 23, 28, 31]), though generally with the
goal of predicting user ratings (or opinions) rather than
learning relationships between products as we do here. More
specifically, our work builds on topic models for networks:
Block-LDA, topic-link LDA, and relational topic models
all attempt to identify topics that explain the links in document
networks. A promising line of work uses such ideas to model
social and citation networks. However, these
methods have trouble scaling to large networks, while Sceptre scales
to corpora with millions of documents (products) and hundreds
of millions of links.

Last, a related branch of work aims to enhance e-commerce
using browsing data. For example, one such work aims to forecast
commercial intent based on query logs; and in [26] the authors use
query data to identify attributes that are important to users in
order to surface recommendations. While different in terms of
the data and problem setting, such works are similar in that they
uncover relationships from large sources of weakly-structured
data.


3 The Sceptre Model 

In the following we build Sceptre gradually, but in such a way
that at each step we are specifying a usable model. We highlight
the differences between successive iterations of our model in
blue. We do this to emphasize the fact that while Sceptre makes
use of several interacting components, each of these
components brings an additional modeling aspect into the framework.
Table 1 describes the notation we use throughout the paper.

3.1 High-level Overview 

We first present a high-level, conceptual view of Sceptre, to 
explain the intuition behind the model before we fully specify 
it. 

Topic Models. We use topic models [4] to discover topics from
product reviews and other sources of text. Conceptually, this
means that the text from millions of products can be clustered
into a small number of dimensions, so that each product $i$ (and
its text) can be represented by a topic vector $\theta_i$ encoding the
extent to which reviews/descriptions of a given product discuss
each of the topics.

Link Prediction. Topic models allow us to represent each
product $i$ by a vector $\theta_i$. On top of this we can have a
statistical model to predict properties about products. In our case,
we use logistic regression to make predictions about pairs of
items, using features that combine the topics of two products
$\theta_i$ and $\theta_j$ simultaneously. The classification task we
are interested in is: does a relationship exist between $i$ and $j$?
Using pairwise features of the products, e.g.
$\psi(i,j) = \theta_j - \theta_i$, we build logistic classifiers such
that $\langle \beta, \psi(i,j) \rangle$ takes a positive value if $i$
and $j$ are connected by an edge. We further develop this model so that
predicting the presence of an edge and the direction of an edge
are treated as two separate tasks, to account for asymmetries
and to help with interpretability.

Importantly, it should be noted that we do not train topic 
models and then perform link prediction, but rather we define a 
joint objective such that we discover topics that are informative 
for our link prediction task. In this way our model uncovers 
topics that are good at ‘explaining’ the relationships between 
products. 

Micro-Categories. An additional goal of Sceptre is to be able
to discover micro-categories of closely related products. We
achieve this by using sparse representations of very
high-dimensional topic vectors for each product. We make use of
explicit product hierarchies (such as the category tree available
from Amazon), where each node of the hierarchy has a small
number of topics associated with it. The hierarchical nature of
the category tree means that topics associated with top-level
nodes are general and broad, while topics associated with leaf
categories focus on differentiating between subtle product
features, which can be interpreted as micro-categories (e.g.
different styles of running shoes).

Product graph. Finally, we have a supervised learning
framework to predict relationships between products. Discovering
substitutes and complements then depends on the choices of
graph we use to train the model, for which we collect several
graphs of related products from Amazon. For example, a
co-purchasing graph such as ‘users frequently purchased a and
b together’ encodes some notion of complements, whereas a
graph such as ‘users who viewed a eventually purchased b’


Symbol             Description
$d_i$              document associated with an item (product) $i$
$\mathcal{T}$      document corpus
$K$                number of topics
$\theta_i$         $K$-dimensional topic distribution for item $i$
$\phi_k$           word distribution for topic $k$
$w_{d,j}$          $j$th word of document $d$
$z_{d,j}$          topic of the $j$th word of document $d$
$N_d$              number of words in document $d$
$F(x)$             logistic (sigmoid) function, $1/(1 + e^{-x})$
$\mathcal{E}_g$    observed edges in graph $g$
$\psi(i,j)$        pairwise (undirected) features for items $i$ and $j$
$\varphi(i,j)$     pairwise (directed) features for items $i$ and $j$
$\beta$            logistic weights associated with $\psi(i,j)$
$\eta$             logistic weights associated with $\varphi(i,j)$


Table 1: Notation. 


captures the notion of substitutes. Thus, for every product, we 
predict a list of complementary and substitutable products and 
collect them into a giant network of related products. 


3.2 Detailed Technical Description 

3.2.1 Background: Latent Dirichlet Allocation 

Latent Dirichlet Allocation (LDA) uncovers latent structure
in document corpora. For the moment, ‘documents’ shall be the
set of reviews associated with a particular product. LDA
associates each document in a corpus $d \in \mathcal{T}$ with a
$K$-dimensional topic distribution $\theta_d$ (a stochastic vector,
i.e., $\sum_k \theta_{d,k} = 1$), which encodes the fraction of words
in $d$ that discuss each of the $K$ topics. That is, words in the
document $d$ discuss topic $k$ with probability $\theta_{d,k}$.

Each topic $k$ also has an associated word distribution, $\phi_k$,
which encodes the probability that a particular word is used for
that topic. Finally, the topic distributions themselves ($\theta_d$) are
assumed to be drawn from a Dirichlet prior.

The final model includes word distributions for each topic
$\phi_k$, topic distributions for each document $\theta_d$, and topic
assignments for each word $z_{d,j}$. Parameters $\Phi = \{\theta, \phi\}$
and topic assignments $z$ are traditionally updated via sampling. The
likelihood of a particular text corpus $\mathcal{T}$ (given the word
distribution $\phi$, topics $\theta$, and topic assignments for each
word $z$) is then

$$p(\mathcal{T} \mid \theta, \phi, z) = \prod_{d \in \mathcal{T}} \prod_{j=1}^{N_d} \theta_{d, z_{d,j}} \, \phi_{z_{d,j}, w_{d,j}}, \qquad (1)$$

where we are multiplying over all documents in the corpus,
and all words in each document. The two terms in the product
are the likelihood of seeing these particular topics
($\theta_{d, z_{d,j}}$), and the likelihood of seeing these particular
words for this topic ($\phi_{z_{d,j}, w_{d,j}}$).
3.2.2 Link Prediction with Topic Models

‘Supervised Topic Models’ allow topics to be discovered
that are predictive of an output variable associated with each
document. We propose a variant of a supervised topic model
that identifies topics that are useful as features for link
prediction. We choose an approach based on logistic regression
because (1) it can be scaled to millions of documents/products
and hundreds of millions of edges, and (2) it can be adapted to
incorporate both latent features (topics) and manifest features
(such as brand, price, and rating information), as well as
arbitrary transformations and combinations of these features. Our
goal here is to predict links, that is, labels at the level of pairs of
products. In particular, we want to train logistic classifiers that
for each pair of products $(i,j)$ predict whether they are related
($y_{i,j} = 1$) or not ($y_{i,j} = 0$). For now we will consider the case
where we are predicting just a single type of relationship and
we will later generalize the model to predict multiple types of
relationships (substitutes and complements) simultaneously.

We want the topics associated with each product to be
‘useful’ for logistic regression in the sense that we are able to learn
a logistic regressor parametrized by $\beta$ that predicts $y_{i,j}$, using
the topics $\theta_i$ and $\theta_j$ as features. That is, we want the
logistic function

$$F_\beta(\psi_\theta(i,j)) = \sigma(\langle \beta, \psi_\theta(i,j) \rangle) \qquad (2)$$

to match $y_{i,j}$ as closely as possible, where $\psi_\theta(i,j)$ is a
pairwise feature vector describing the two products. We then aim
to design features that encode the similarity between the two
products (documents). The specific choice we adopt is

$$\psi_\theta(i,j) = (1, \theta_{i,1} \cdot \theta_{j,1}, \theta_{i,2} \cdot \theta_{j,2}, \ldots, \theta_{i,K} \cdot \theta_{j,K}). \qquad (3)$$

Intuitively, by defining our features to be the elementwise
product between $\theta_i$ and $\theta_j$, we are saying that products with
similar topic vectors are likely to be linked. The logistic vector $\beta$ then
determines which topic memberships should be similar
(or dissimilar) in order for the products to be related.
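The undirected predictor can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`psi`, `link_probability`) and the toy vectors are ours, and the weights are made up rather than learned.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def psi(theta_i, theta_j):
    """Undirected pairwise features: a bias term followed by the
    elementwise product of the two topic vectors."""
    return np.concatenate(([1.0], theta_i * theta_j))

def link_probability(beta, theta_i, theta_j):
    """Probability that items i and j are related, given logistic weights beta."""
    return sigmoid(beta @ psi(theta_i, theta_j))

# Hypothetical topic vectors and weights (K = 3 topics, so beta has K + 1 entries).
theta_a = np.array([0.7, 0.2, 0.1])
theta_b = np.array([0.6, 0.3, 0.1])
beta = np.array([-1.0, 4.0, 2.0, 0.5])
p_ab = link_probability(beta, theta_a, theta_b)
p_ba = link_probability(beta, theta_b, theta_a)
```

Because the elementwise product is commutative, `p_ab` and `p_ba` are identical, which is exactly why these features alone can only describe undirected relationships.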

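Our goal then is to simultaneously optimize both topic
distributions $\theta_d$ and logistic parameters $\beta$ to maximize the joint
likelihood of topic memberships and relationships in the
product graph:

$$L(y, \mathcal{T} \mid \beta, \theta, \phi, z) = \overbrace{\prod_{d \in \mathcal{T}} \prod_{j=1}^{N_d} \theta_{d, z_{d,j}} \, \phi_{z_{d,j}, w_{d,j}}}^{\text{corpus likelihood}} \; \underbrace{\prod_{(i,j) \in \mathcal{E}} F_\beta(\psi_\theta(i,j)) \prod_{(i,j) \in \bar{\mathcal{E}}} \left(1 - F_\beta(\psi_\theta(i,j))\right)}_{\text{logistic likelihood of the observed graph}}. \qquad (4)$$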

This expression says that the review corpus should have high 
likelihood according to a topic model, but also that those
topics should be useful as predictors in a logistic regressor that
uses their similarity as features. In this way, we will intuitively 
discover topics that correspond to some concept of document 
‘relatedness’. 


This idea of jointly training topic and regression models is
closely related to the model of [22], where topics were
discovered that are useful as parameters in a latent-factor
recommender system. Roughly, in the model of [22], a user would
give a high rating to a product if their latent user parameters
were similar to the topics discovered from reviews of that item;
topics were then identified that were good at predicting users’
ratings of items. The basic model of (eq. 4) is similar in the
sense that we are coupling parameters $\theta$ and $\beta$ in a joint
likelihood in order to predict the output variable $y$.

Directed vs. Undirected Graphs. So far we have shown how
to train topic models to predict links between products.
However, the feature vector of (eq. 3) is symmetric
($\psi_\theta(i,j) = \psi_\theta(j,i)$), meaning that it is only useful for
predicting undirected relationships. Yet none of the relationships we
want to predict are necessarily symmetric. For example $y$ may
be a good substitute for $x$ if $y$ is a similar product that is cheaper
and better rated, but in this case $x$ would be a poor substitute for
$y$. Or, while a replacement battery may be a good complement
for a laptop, recommending a laptop to a user already
purchasing a battery makes little sense. Thus we ought to account for
such asymmetries in our model.

We model such asymmetries by first predicting whether two
products are related, and then predicting in which direction the
relation flows. That is, we predict

$p(a \text{ has an edge toward } b) = p(a \text{ is related to } b) \times p(\text{edge flows from } a \text{ to } b \mid a \text{ is related to } b)$,

which we denote

$$p((a,b) \in \mathcal{E}) = \underbrace{p(a \leftrightarrow b)}_{\text{‘are they related?’}} \; \overbrace{p(a \rightarrow b \mid a \leftrightarrow b)}^{\text{‘does the edge flow in this direction?’}}, \qquad (5)$$

where relations $(a,b) \in \mathcal{E}$ are now ordered pairs (that may
exist in both directions). We model relations in this way since we
expect different types of language or features to be useful for
the two tasks: relatedness is a function of what two products
have in common, whereas the direction the link flows is a
function of how the products differ. Indeed, in practice we find that
the second predictor $p(a \rightarrow b \mid a \leftrightarrow b)$ tends to pick up
qualitative language that explains why one product is ‘better than’
another, while the first tends to focus on high-level
category-specific topics. Our objective now becomes

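$$L(y, \mathcal{T} \mid \beta, \eta, \theta, \phi, z) = \overbrace{\prod_{(i,j) \in \mathcal{E}} F^{\leftrightarrow}_\beta(\psi_\theta(i,j)) \, F^{\rightarrow}_\eta(\varphi_\theta(i,j))}^{\text{positive relations and their direction of flow}} \; \underbrace{\prod_{(i,j) \in \bar{\mathcal{E}}} \left(1 - F^{\leftrightarrow}_\beta(\psi_\theta(i,j))\right)}_{\text{non-relations}} \; \underbrace{\prod_{d \in \mathcal{T}} \prod_{j=1}^{N_d} \theta_{d, z_{d,j}} \, \phi_{z_{d,j}, w_{d,j}}}_{\text{corpus likelihood}}. \qquad (6)$$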

Here $F^{\leftrightarrow}_\beta$ is the same as in the previous section, though we
have added $F^{\rightarrow}_\eta(\varphi_\theta(i,j))$ to predict edge directedness; this
includes an additional logistic parameter vector $\eta$, as well as an
additional feature vector $\varphi_\theta(i,j)$. The specific feature vector
we use is

$$\varphi_\theta(i,j) = (1, \theta_{j,1} - \theta_{i,1}, \ldots, \theta_{j,K} - \theta_{i,K}), \qquad (7)$$

i.e. the direction in which an edge flows between two items is a
function of the difference between their topic vectors.
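The two-stage factorization (relatedness, then direction) can be sketched as follows. This is an illustration only: the helper names and the toy weights are assumptions, and manifest features such as price or brand are left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def psi(theta_i, theta_j):
    """Undirected features: bias plus elementwise product of topic vectors."""
    return np.concatenate(([1.0], theta_i * theta_j))

def varphi(theta_i, theta_j):
    """Directed features: bias plus difference of topic vectors (j minus i)."""
    return np.concatenate(([1.0], theta_j - theta_i))

def directed_edge_probability(beta, eta, theta_i, theta_j):
    """p(i -> j) = p(i and j are related) * p(edge flows i -> j | related)."""
    related = sigmoid(beta @ psi(theta_i, theta_j))
    direction = sigmoid(eta @ varphi(theta_i, theta_j))
    return related * direction

# Hypothetical parameters for K = 2 topics. The bias of eta is set to zero
# here so that the two directions split the relatedness probability exactly.
theta_a = np.array([0.7, 0.3])
theta_b = np.array([0.2, 0.8])
beta = np.array([-0.5, 3.0, 3.0])
eta = np.array([0.0, 2.0, -1.0])

p_ab = directed_edge_probability(beta, eta, theta_a, theta_b)
p_ba = directed_edge_probability(beta, eta, theta_b, theta_a)
related = sigmoid(beta @ psi(theta_a, theta_b))
```

With a zero bias in `eta`, the direction scores for (a, b) and (b, a) are negatives of each other, so `p_ab + p_ba` equals the relatedness probability; a non-zero bias lets the model favor edges in both directions or in neither.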

Incorporating Other Types of Features. We can easily
incorporate manifest features into our logistic regressors, which
simply become additional dimensions in $\varphi_\theta(i,j)$. We include the
difference in price, difference in average (star-) rating, and an
indicator that takes the value 1 if the items were manufactured
by different companies, allowing the model to capture the fact
that users may navigate towards (or away from) cheaper
products, better rated products, or products from a certain brand.

Our entire model ultimately captures the following simple 
intuition: (1) Users navigate between related products, which 
should have similar topics (“what do a and b have in
common?”), and (2) The direction in which users navigate should
be related to the difference between topics (“what does b have 
that a doesn’t?”). Ultimately, all of the above machinery has 
been designed to discover topics and predictors that capture this 
intuition. 

Learning Multiple Graphs. Next we must generalize our
approach to simultaneously learn multiple types of relationships.
In our case we wish to discover a graph of products that users 
might purchase instead (substitute products), as well as a graph 
of products users might purchase in addition (complementary 
products). Naturally, one could train models independently for 
each type of relationship. But then one would have two sets of 
topics, and two predictors that could be used to predict links in 
each graph. 

Instead we decide to extend the model from the previous
section so that it can predict multiple types of relations
simultaneously. We do this by discovering a single set of topics that work
well with multiple logistic predictors. This is a small change
from the previous model of (eq. 6):


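$$L(y, \mathcal{T} \mid \beta, \eta, \theta, \phi, z) = \overbrace{\prod_{d \in \mathcal{T}} \prod_{j=1}^{N_d} \theta_{d, z_{d,j}} \, \phi_{z_{d,j}, w_{d,j}}}^{\text{corpus likelihood}} \; \underbrace{\prod_{g \in \mathcal{G}} \Big( \prod_{(i,j) \in \mathcal{E}_g} F^{\leftrightarrow}_{\beta_g}(\psi_\theta(i,j)) \, F^{\rightarrow}_{\eta_g}(\varphi_\theta(i,j)) \prod_{(i,j) \in \bar{\mathcal{E}}_g} \big(1 - F^{\leftrightarrow}_{\beta_g}(\psi_\theta(i,j))\big) \Big)}_{\text{accuracy of the predictors } \beta_g \text{ and } \eta_g \text{ for the graph } g} \qquad (8)$$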

where each graph $g \in \mathcal{G}$ contains all relations of a particular
type.

Note that we learn separate predictors $\beta_g$ and $\eta_g$ for each
graph $g$, but we learn a single set of topics ($\theta$) and features ($\psi$
and $\varphi$) that work well for all graphs simultaneously. We adopt
this approach because it provides a larger training set that is
Figure 2: Part of the product hierarchy for Amazon Electronics
products (the complete tree, even for Electronics alone, is too
large to include here).


more robust to overfitting compared to training two models
separately. Moreover it means that both logistic regressors operate
in the same feature space; this means that by carefully
constructing our labeled training set (to be described in the
following section), we can train the model not only to predict
substitute and complementary relationships, but also to differentiate
between the two, by training it to identify substitutes as
non-complements and vice versa.

3.2.3 Sparse Topics via Product Hierarchies 

Our goal is to learn topic models on corpora with millions
of products and hundreds of topics. However, training
models with hundreds of topics per product is not practical, nor is
it realistic from a modeling perspective. Rather, each product
should draw from a small number of topics, which can be
encoded using a sparse representation. To achieve this we
enforce sparsity through a specific type of hierarchical topic
representation that makes use of an explicit category tree, such
as the one available from Amazon.

An example of the product hierarchy we obtain from Amazon
is shown in Figure 2. The category of each product is a node in
this tree (though not necessarily a leaf node); some products 
may also belong to multiple categories simultaneously. 

We build our topic representation using the following
scheme: First, each product is represented by a path, or more
simply a set of nodes, in the category tree. For products
belonging to multiple categories, we take the union of those paths.
Second, each topic is associated with a particular node in the
category tree. Every time we observe, say, a thousand instances
of a node, we associate an additional topic with that node, up
to some maximum. In this way we have many topics associated
with popular categories (like ‘Electronics’) and fewer topics
associated with more obscure categories. Third, we use a sparse
Figure 3: A demonstration of our topic hierarchy. A product (left) is shown with its associated topics (right): (a) the category tree,
(b) the topic vector, (c) the product’s ground-truth category. The product’s position in the category tree is highlighted in red, and
the set of topics that are ‘activated’ is highlighted in gray.


representation for each product’s topic vector. Specifically, if a
product occupies a particular path in the category tree, then it
can only draw words from topics associated with nodes in that
path. In this way, even though our model may have hundreds
of topics, only around 10-20 of these will be ‘active’ for any
particular product. This is not only necessary from a
scalability standpoint, but it also helps the model quickly converge to
meaningful topic representations.
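The sparsity scheme can be illustrated with a toy category tree. Everything below is a hypothetical sketch: the node names, the number of topics per node, and the helper functions are assumptions chosen for illustration, not the actual Amazon hierarchy or the authors' implementation.

```python
import numpy as np

# Hypothetical assignment of topic indices to category-tree nodes.
NODE_TOPICS = {
    "Electronics": [0, 1, 2],    # broad topics shared by all electronics
    "Laptops": [3, 4],
    "Chargers": [5, 6],          # leaf-level topics act as 'micro-categories'
    "Mobile Phones": [7, 8],
}
K = 9  # total number of topics in this toy model

def active_topic_mask(path):
    """Boolean mask of the topics a product may draw from, given the set of
    category nodes on its path (or union of paths)."""
    mask = np.zeros(K, dtype=bool)
    for node in path:
        mask[NODE_TOPICS[node]] = True
    return mask

def sparse_topic_vector(weights, path):
    """Zero out inactive topics and renormalize so the vector sums to 1."""
    mask = active_topic_mask(path)
    theta = np.where(mask, weights, 0.0)
    return theta / theta.sum()

# A laptop charger: may use Electronics, Laptops, and Chargers topics only.
charger_path = ["Electronics", "Laptops", "Chargers"]
theta_charger = sparse_topic_vector(np.ones(K), charger_path)
```

In this toy setup the charger has 7 active topics out of 9; with a real category tree and hundreds of topics, the same masking keeps only a small active subset per product.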

This process is depicted in Figure 3. Here a product like
a laptop charger draws from ‘generic’ topics that apply to all
electronics products, as well as topics that are specific to
laptops or chargers; but it cannot draw from topics that are
specific to mobile phones or laptop cases (for example). Thus all
products have some high-level categories in common, but are
also assumed to use their own unique sub-category specific
language. Then, at the lowest level, each leaf node in the category
tree is associated with multiple topics; thus we might learn
several ‘microcategories’ of laptop chargers, e.g. for different
laptop types, price points, or brands. We present some examples
of the types of microcategory we discover in Section 4.6.


3.3 Optimization and Training 

Optimizing an objective such as the one in (eq. 8) is a
difficult task, for instance it is certainly non-convex.* We solve it
using the following EM-like procedure, in which we alternate
between optimizing the model parameters $\Theta = (\beta, \eta, \theta, \phi)$ and
topic assignments (latent variables) $z$:

update $\Theta^{(t)} = \operatorname{argmax}_{\Theta} \, l(y, \mathcal{T} \mid \beta, \eta, \theta, \phi, z^{(t-1)})$   (9)

sample $z^{(t)}_{d,j}$ with probability $p(z^{(t)}_{d,j} = k) = \theta^{(t)}_{d,k} \, \phi^{(t)}_{k, w_{d,j}}$,   (10)

where $l(y, \mathcal{T} \mid \beta, \eta, \theta, \phi, z)$ is the log-likelihood from (eq. 8). To
generate initial values for $\Theta$ and $z$ we initialize continuous
parameters and topics uniformly at random (continuous
parameters are sampled from $[0,1)$).

In the first step (eq. 9), topic assignments for each word ($z$)
are fixed. We fit the remaining terms, $\beta$, $\eta$, $\theta$, and $\phi$, by
gradient ascent. We use the Hybrid LBFGS solver, a
quasi-Newton method for non-linear optimization of problems with
many variables [25]. Computing the partial derivatives
themselves, while computationally expensive, is naively
parallelizable over edges in $\mathcal{E}$ and documents (i.e., products) in $\mathcal{T}$.

The second step iterates through all products d and all word 
positions j and updates topic assignments. As with LDA, we 

*It is smooth, and multiple local minima can be found by permuting the 
order of topics and logistic parameters. 


assign each word to a topic (an integer between 1 and $K$)
randomly, with probability proportional to the likelihood of that
topic occurring with that word. The expression $\theta_{d,k} \phi_{k, w_{d,j}}$ is
the probability of the topic $k$ for the product $d$ ($\theta_{d,k}$),
multiplied by the probability of the word at position $j$ ($w_{d,j}$) being
used in topic $k$ ($\phi_{k, w_{d,j}}$).
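The sampling step for a single document can be sketched as below. This is a minimal illustration under assumptions: the function name and array layout are ours, topics are indexed from 0 rather than 1, and the gradient-based update of the continuous parameters is not shown.

```python
import numpy as np

def sample_topic_assignments(theta_d, phi, words, rng):
    """Resample the topic of every word in one document.

    theta_d: (K,) topic weights for the document (zeros for inactive topics)
    phi:     (K, V) word distributions, phi[k, w] = p(word w | topic k)
    words:   sequence of word ids for the document
    rng:     a numpy random Generator
    """
    z = np.empty(len(words), dtype=int)
    for j, w in enumerate(words):
        # Unnormalized probability of each topic k: theta[d, k] * phi[k, w].
        p = theta_d * phi[:, w]
        p = p / p.sum()
        z[j] = rng.choice(len(theta_d), p=p)
    return z

# Toy example (hypothetical numbers): the document only uses topic 0,
# so every word must be assigned to topic 0.
rng = np.random.default_rng(0)
phi = np.array([[0.5, 0.5], [0.1, 0.9]])
z = sample_topic_assignments(np.array([1.0, 0.0]), phi, words=[0, 1, 1], rng=rng)
```

Topics that are inactive for a product (zero weight in `theta_d`) receive zero probability, so the sparse hierarchical representation carries over to the sampling step without extra bookkeeping.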

4 Experiments 

Next we evaluate Sceptre. We first describe the data as well as
the baselines and then proceed with experimental results.

4.1 Data 

We use data crawled from Amazon.com, whose characteristics
are shown in Table 2. This data was collected by performing
a breadth-first search on the user-product-review graph until
termination, meaning that it is a fairly comprehensive
collection of English-language product data. We split the full dataset
into top-level categories, e.g. Books, Movies, Music. We do this
mainly for practical reasons, as it allows each model and dataset
to fit in memory on a single machine (requiring around 64GB
RAM and 2-3 days to run our largest experiment). Note that 
splitting the data in this way has little impact on performance, 
as there are few links that cross top-level categories, and the 
hierarchical nature of our model means that few parameters are 
shared across categories. 

To obtain ground-truth for pairs of substitutable and
complementary products we also crawl graphs of four types from
Amazon:

1. ‘Users who viewed x also viewed y’; 91M edges.

2. ‘Users who viewed x eventually bought y’; 8.18M edges.

3. ‘Users who bought x also bought y’; 133M edges.

4. ‘Users frequently bought x and y together’; 4.6M edges. 
We refer to edges of type 1 and 2 as substitutes and edges of 
type 3 or 4 as complements, though we focus on ‘also viewed’ 
and ‘also bought’ links in our experiments, since these form the 
vast majority of the dataset. Note the minor differences between 
certain edge types, e.g. edges of type 4 indicate that two items 
were purchased as part of a single basket, rather than across 
sessions. 

4.2 Experimental Setting 

We split our training data ($\mathcal{E}$ and $\bar{\mathcal{E}}$) into 80% training, 10%
validation, 10% test, discarding products with fewer than 20
reviews. In all cases we report the error on the test set. The
Category           Users    Items    Reviews   Edges
Men’s Clothing     1.25M    371K     8.20M     8.22M
Women’s Clothing   1.82M    838K     14.5M     17.5M
Music              1.13M    557K     6.40M     7.98M
Movies             2.11M    208K     6.17M     4.96M
Electronics        4.25M    498K     11.4M     7.87M
Books              8.20M    2.37M    25.9M     50.0M
All                21.0M    9.35M    144M      237M


Table 2: Dataset statistics for a selection of categories on Amazon.


iterative fitting process described in (eqs. 9 and 10) continues
until no further improvement is gained on the validation set.


Sampling Non-Edges. Since it is impractical to train on all
pairs of non-links, we start by building a balanced dataset by
sampling as many non-links as there are links (i.e., $|\bar{\mathcal{E}}| = |\mathcal{E}|$).

However, we must be careful about how non-links {i.e., neg¬ 
ative examples) are sampled. Sampling random pairs of unre¬ 
lated products makes negative examples very ‘easy’ to classify; 
rather, since we want the model to be able to differentiate be¬ 
tween edge types, we treat substitute links as negative exam¬ 
ples of complementary edges and vice versa. Thus, we explic¬ 
itly train the model to identify substitutes as non-complements 
and vice versa (in addition to a random sample of non-edges). 
This does not make prediction ‘easier’, but it helps the model 
to learn a better separation between the two edge types, by ex¬ 
plicitly training it to learn distinct notions of the two concepts. 
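
The sampling scheme above can be sketched as follows. This is a minimal illustration, assuming edges are stored as sets of (a, b) product-ID pairs; the function name `build_training_pairs`, the `n_random` parameter, and the data layout are our own assumptions rather than the authors' code.

```python
import random

def build_training_pairs(substitutes, complements, products, n_random, seed=0):
    """Build (pair, label) examples for the complement predictor.

    Positives are complement edges. Negatives are the substitute edges
    (hard negatives) plus n_random random pairs found in neither edge set.
    Swapping the first two arguments gives data for the substitute predictor.
    """
    rng = random.Random(seed)
    known = substitutes | complements
    n = len(products)
    # Guard against an endless rejection-sampling loop.
    if n_random > n * (n - 1) - len(known):
        raise ValueError("not enough non-edges to sample from")

    examples = [(pair, 1) for pair in sorted(complements)]
    # Hard negatives: edges of the other type.
    examples += [(pair, 0) for pair in sorted(substitutes - complements)]
    # Easy negatives: random ordered pairs that are not known edges.
    random_negatives = set()
    while len(random_negatives) < n_random:
        a, b = rng.sample(products, 2)
        if (a, b) not in known:
            random_negatives.add((a, b))
    examples += [(pair, 0) for pair in sorted(random_negatives)]
    return examples
```

Rejection sampling is used because enumerating every non-edge is exactly what the text describes as impractical; `products` is assumed to be a list so that `rng.sample` can draw from it.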

In the following, we consider both link prediction and ranking
tasks: (1) to estimate for a pair of products whether they are 
related, and (2) for a given query, rank those items that are most 
likely to be related. We first describe the baselines we compare 
against. 


4.3 Baselines 

Random. Link probabilities $F_\beta$ and $F_\eta$ are replaced with
random numbers between 0 and 1. Note that since both predictors 
have to ‘fire’ to predict a relation, random classification
identifies 75% of directed edges as non-links; imbalance in the
number of positive vs. negative relations of each type (due to
our sampling procedure described above) explains why the
performance of random classification is slightly different
across experiments. 
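
As a quick check of the 75% figure, assume (this is our reading, not something the text states explicitly) that the two random scores $u_1$ and $u_2$ are independent and uniform on [0, 1], and that each predictor ‘fires’ when its score exceeds 1/2:

```latex
P(\text{link predicted}) = P(u_1 > \tfrac12)\,P(u_2 > \tfrac12)
                         = \tfrac12 \cdot \tfrac12 = \tfrac14,
\qquad
P(\text{non-link predicted}) = 1 - \tfrac14 = \tfrac34 .
```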

LDA + logistic regression (LDA). Rather than training topic 
models and logistic parameters simultaneously, this baseline 
first trains a topic model and then trains logistic classifiers on 
the pre-trained topics. This baseline assesses our claim that 
Sceptre learns topics that are ‘good’ as features for edge
prediction, by comparing it to a model whose topics were not
trained specifically for this purpose. We used Vowpal Wabbit to
pre-train the topic model, and fit models with K = 100 topics
for each Amazon category. 

We also experimented with a baseline in which features were 
defined over words rather than topics. That is, topics $\theta_i$ for
each product are replaced by TF-IDF scores for words in its
reviews. Logistic parameters $\beta$ and $\eta$ are then trained to
determine which TF-IDF-weighted words are good at predicting the
presence or absence of edges. This baseline was uniformly weaker 
than our other baselines, so we shall not discuss its performance 
further. 

Category-Tree (CT). Since we make use of Amazon’s category 
tree when building our model, it is worth questioning the extent 
to which the performance of our model owes to our decision to 
use this source of data. For this baseline, we compute the
co-counts between categories $c_1 \rightarrow c_2$ that are observed in
our training data. Then we predict that an edge exists if it is
among the 50th percentile of most commonly co-occurring
categories. In other words, this baseline ‘lifts’ links to the
level of categories rather than individual products. 
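
One way to read this baseline is sketched below. The helper names, the `category_of` mapping, and the use of the median count as the ‘50th percentile’ cut-off (with ties counted as edges) are all assumptions made for illustration; the paper does not spell out these details.

```python
from collections import Counter
import statistics

def fit_category_tree_baseline(train_edges, category_of):
    """Count how often each ordered category pair is linked in training."""
    counts = Counter((category_of[a], category_of[b]) for a, b in train_edges)
    # Assumed cut-off: the median co-count over observed category pairs.
    threshold = statistics.median(counts.values())
    return counts, threshold

def predict_edge(a, b, counts, threshold, category_of):
    """Predict a link when the pair's categories co-occur often enough."""
    pair = (category_of[a], category_of[b])
    return counts.get(pair, 0) >= threshold
```

Category pairs never seen in training get a count of zero and are therefore predicted as non-links.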

Item-to-Item Collaborative Filtering (CF). In 2003 Amazon
reported that their own recommendation solution was a
collaborative-filtering approach that identified items that had
been browsed or purchased by similar sets of users. This 
baseline follows the same procedure, though in lieu of actual 
browsing or purchasing data we consider sets of users who have 
reviewed each item. We then proceed by computing for each 
pair of products a and b the cosine similarity between the set 
of users who reviewed a and the set of users who reviewed b. 
Sorting by similarity generates a ranked list of
recommendations for each product. Since this method is not
probabilistic we only report its performance at ranking tasks. 
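
The similarity computation described above can be sketched as follows, treating each product's reviewer set as a binary vector. The function names and the `reviewers` mapping (product ID to set of user IDs) are illustrative assumptions.

```python
import math

def cosine_similarity(users_a, users_b):
    """Cosine similarity between two user sets viewed as binary vectors."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def rank_by_similarity(query, reviewers):
    """Rank every other product by similarity to `query`, highest first."""
    scores = {
        other: cosine_similarity(reviewers[query], users)
        for other, users in reviewers.items()
        if other != query
    }
    # Ties are broken by product ID so the ranking is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))
```

For binary vectors the dot product is the size of the intersection and each norm is the square root of the set size, which is why no explicit vectors are needed.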


4.4 Link Prediction and Ranking 

Link Prediction. Our first goal is to predict for a given pair of 
products (a, b), and a graph type g, whether there is a link from 
a to b in $\mathcal{E}_g$. We optimize exactly the model's training
objective. Note that a prediction is correct when 

• for each test edge (in each graph) $a \rightarrow b$, 

  $F^{\leftrightarrow}(\psi(a,b), \beta) > 0$ and $F^{\rightarrow}(\varphi(a,b), \eta) > 0$ 

• for each non-edge $a \not\rightarrow b$, 

  $F^{\leftrightarrow}(\psi(a,b), \beta) < 0$ or $F^{\rightarrow}(\varphi(a,b), \eta) < 0$, 

in other words the model must correctly predict both that the 
link exists and its direction. 
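
The correctness rule above amounts to a small decision function. The sketch below assumes the two predictor outputs are available as signed scores; the function names are ours, and treating a score of exactly 0 as ‘not fired’ is an assumption the text leaves open.

```python
def is_prediction_correct(link_score, direction_score, is_edge):
    """Apply the rule: an edge is predicted only if both scores are positive."""
    predicted_edge = link_score > 0 and direction_score > 0
    return predicted_edge == is_edge

def accuracy(examples):
    """Fraction of (link_score, direction_score, is_edge) triples judged correct."""
    correct = sum(is_prediction_correct(l, d, y) for l, d, y in examples)
    return correct / len(examples)
```

A true edge whose direction score is negative counts as an error even when the link score is positive, which is what makes the task stricter than plain link detection.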

Results are shown in Table 3 for each of the datasets in
Table 2. We also show results from ‘Baby’ clothes, to
demonstrate that performance does not degrade significantly on a
(relatively) smaller dataset (43k products). ‘Substitute’ links
were unavailable for the vast majority of products from Music and 
Movies in our crawl, so results are not shown. We summarize 
the main findings from this table as follows: 

1. Sceptre is able to accurately predict both substitute and 
complement links across all product categories, with
performance being especially accurate for clothing and
electronics products. Accuracy is between 85.57% and 96.76% for 
the binary prediction tasks we considered. 

2. Prediction of ‘substitute’ links is uniformly more accurate 
than ‘complement’ links for all methods, both in absolute 
(left two columns) and relative (right two columns) terms. 
This matches our intuition that substitute links should be 
‘easier’ to predict, as they essentially correspond to some 
notion of similarity, whereas the semantics of complements
are more subtle. 

3. The performance of the baselines is variable. For
substitute links, our LDA baseline obtains reasonable
performance on Books and Electronics, whereas the Category 
Tree (CT) baseline is better for Clothing. In fact, the CT 
baseline performs surprisingly well at predicting
substitute links, for the simple reason that substitutable
products often belong to the same category as each other. 

4. None of the baselines yield satisfactory performance when 
predicting complement links. Thus we conclude that neither
the topics uncovered by a regular topic model, nor 
the category tree alone are capable of capturing the subtle 
notions of what makes items complementary. 

Footnote (Category-Tree baseline): We experimented with several
variations on this theme, and this approach yielded the best
performance. 

Ultimately we conclude that each of the parts of Sceptre
contributes to its accurate performance. Category information is 
helpful, but alone is not sufficient to predict complements; and
simultaneous training of topic models and link prediction is
necessary to learn useful topic representations. 

Ranking. In many applications distinguishing links from
non-links is not enough, as for each product we would like to
recommend a limited number of substitutes and complements. Thus, 
it is important that relevant items (i.e., relevant relations) are 
ranked higher than irrelevant ones, regardless of the likelihood 
that the model assigns to each recommendation. 

A standard measure to evaluate ranking methods is the
precision@k. Given a set of recommended relations of a given type 
rec, and a set of known-relevant products rel (i.e., ground-truth 
links) the precision is defined as 

  $\text{precision} = |rel \cap rec| / |rec|$,   (11) 

i.e., the fraction of recommended relations that were relevant. 
The precision@k is then the precision obtained given a fixed 
budget, i.e., when |rec| = k. This is relevant when only a small 
number of recommendations can be surfaced to the user, where 
it is important that relevant products appear amongst the first 
few suggestions. 
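
Equation (11) with a fixed budget k can be computed as in the sketch below. The function name and the list/set inputs are illustrative assumptions; when fewer than k items are ranked, the denominator is the number of items actually recommended, following |rec| in eq. 11.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)
```

For example, with the ranking `["a", "b", "c", "d"]` and relevant set `{"a", "d"}`, the precision@1 is 1.0 and the precision@2 is 0.5.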

Figure 4 reports the precision@k on Men’s and Women’s 
clothing. Note that we naturally discard candidate links that
appeared during training. This leaves only a small number of
relevant products for each query item in the corpus: the random 
baseline (which up to noise should be flat) has precision around 
$5 \times 10^{-5}$, indicating that only around 5 in 100,000 products
are labeled as ‘relevant’ in this experiment. This, in addition
to the fact that many relevant items may not be labeled as such
(there are presumably thousands of pairs of substitutable pants
in our corpus, but only 30 or so are recommended for each
product), highlights the incredible difficulty of obtaining high
precision scores for this task. 

Overall, collaborative filtering is one to two orders of
magnitude more accurate than random rankings, while Sceptre is
an order of magnitude more accurate again (our LDA and TF-IDF
baselines were less accurate than collaborative filtering and
are not shown). 

[Figure 4 panels: Women’s (Complements) and Men’s (Complements);
x-axis: k, from 0 to 1000.] 

Figure 4: Precision@k for Women’s and Men’s clothing. 

Examples of recommendations generated by Sceptre are 
shown in Figure 6. 


4.5 Cold-Start Prediction without Reviews 

Although it is promising that we are able to recommend
substitutes and complements from the text of product reviews, 
an obvious question remains as to what can be done for new 
products that do not yet have any reviews associated with 
them, known as the ‘cold-start’ problem in recommender systems
[16, 23, 22, 30, 36]. To address this problem, we note that 
Sceptre merely requires that we have some source of text
associated with each linked item in order to learn a model of
which products are likely to be related. 

To evaluate the possibility of using sources of text other 
than reviews, we collected descriptive text about each item in 
our Amazon Books catalog, including blurbs, ‘about the author’
snippets, and editorial reviews. We also collected
manufacturer’s descriptions for a subset of our Electronics data. 
Training on these sources of data, Sceptre obtained accuracies 
between 91.28% and 93.67% at predicting substitutes and
complements (see Table 4). This result implies that training is
possible on diverse sources of text beyond just product reviews,
and that Sceptre can be applied in cold-start settings, even
when no reviews are available. 


Footnote (collaborative filtering): Note that collaborative
filtering is done here at the level of reviewed products, which
is naturally much sparser than the purchase and browsing data
used to produce the ground-truth. 

Footnote (cold-start results): Note that this is not directly
comparable to review-based results since different subsets of
our corpus have reviews vs. descriptions. 

Electronics 

e111 (cameras): camera, zoom, pictures, Kodak, Canon, flash,
    digital, optical, taken, picture
e92 (portable speakers): little speaker, bose, portable speaker,
    small speaker, sound, iHome, bass, wireless speaker,
    great speaker, mini speaker
e75 (cases): leather, case, soft, Roocase, velcro, closed,
    material, snug, protection, standing
e79 (Samsung cases): Galaxy, elastic, magnets, Samsung, leather,
    closed, auto, closing, elastic strap, cover
e78 (heavy-duty cases): Otterbox, Defender, protection, bulky,
    kids, shell, Survivor, protected, safe, protective
e50 (styli): pen, tip, Bamboo, Wacom, styli, gloves,
    Friendly Swede, pencil, capacitive, precise
e69 (batteries): batteries, battery, charged, rechargeable, oem,
    Sanyo, Lenmar, alkaline, Energizer, full charge
e85 (portable radios): radio, weather, crank, solar, Eton,
    Baofeng, radio reception, miles, fm, alert
e96 (car radios): radio, Pioneer, factory, Metra, Ford, dash,
    Honda, Jeep, wiring, deck
e89 (high-end headphones): bass, Sennheiser, Bose, Shure, Beats,
    Koss, Akg, music, classical, Klipsch
e99 (budget headphones): bass, Skullcandy, sound, bud,
    outside noise, another pair, comfortable, gym, Beats, head

Men’s clothing 

c44 (dress shirts): sleeves, arms, neck, shoulders, dress shirt,
    dress, jacket, long sleeve, iron, tucked
c107 (dress shoes): leather, sole, dress, brown, dress shoe,
    polish, brown pair, toe, looking shoe, formal
c75 (dress pants): expandable, expandable waist, Dockers, iron,
    khaki, stretch waist, hidden, ironed, dress pant,
    elastic waist
c49 (three-wolf shirt): wolf, moon, three, power, trailer, hair,
    man, short-sleeve, magic, powerful
c52 (polo shirts): Polo, Lauren, Ralph, Beene, nice shirt,
    Geoffrey, great shirt, quality shirt, white shirt,
    fitted shirt
c110 (boots): Bates, Red Wing, leather, good boot, casual boot,
    dress boot, right boot, motorcycle, Wings, Rangers
c156 (minimalist running): running, trail, barefoot, Altra,
    running shoe, minimalist, zero drop, road, glove, run
c134 (athletic running): Balance, New, wide, running, series,
    feet, usa, cross training, athletic shoe, cross
c133 (sports shoes): court, play, Nike, running shoe, running,
    games, light shoe, great shoe, support, miles
c24 (generic clothing): dry, cold, working, short, hot, weather,
    tight, cool, down, regular
c9 (generic clothing): same, durable, store, different, two,
    brand, comfort, fine, tight, another pair


Table 5: A selection of topics from Electronics and Men’s Clothing along with our labels for each topic. Top 10 words/bigrams 
from each topic are shown after subtracting the background distribution. Capital letters denote brand names (Bamboo, Wacom, 
Red Wing, etc.). 


4.6 Topic Analysis 

Next we analyze the types of topics discovered by Sceptre. As 
we recall from Section 3.2.3, each topic is associated with a 
node in Amazon’s category tree. But, just as a high-level
category such as clothing can naturally be separated into
finer-grained categories like socks, shoes, hats, pants, dresses
(etc.), we hope that Sceptre will discover even subtler groupings 
of products that are not immediately obvious from the
hand-labeled category hierarchy. 

Table 5 shows some of the topics discovered by Sceptre, on 
two Amazon categories: Electronics and Men’s Clothing. We 
pruned our dictionary by using adjectives, nouns, and
adjective-noun pairs (as determined by WordNet), as well as any 
words appearing in the ‘brand’ field of our crawled data. For 
visualization we compute the 10 highest-weight words from all 
topics, after first subtracting a ‘background’ topic containing 
the average weight across all topics. That is, for each topic k
we report the 10 highest values of 

  $\phi_k - \underbrace{\frac{1}{K} \sum_{k'} \phi_{k'}}_{\text{background word distribution}}$ 

By doing so, stopwords and other language common to all topics
appear only in the background distribution. 
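
The background subtraction used for visualization can be sketched in a few lines. This is an illustration only: `phi` is assumed to be a list of per-topic word-weight lists aligned with `vocab`, and the function name is ours.

```python
def top_words(phi, vocab, n=10):
    """Return the n highest-weight words per topic after removing the
    background, i.e. the average weight of each word across all topics."""
    num_topics = len(phi)
    background = [sum(column) / num_topics for column in zip(*phi)]
    result = []
    for row in phi:
        adjusted = [weight - base for weight, base in zip(row, background)]
        order = sorted(range(len(vocab)), key=lambda j: -adjusted[j])
        result.append([vocab[j] for j in order[:n]])
    return result
```

A word such as ‘the’ that carries the same weight in every topic has an adjusted weight of zero everywhere, so it never outranks a word that is specific to one topic.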

The topics we obtain are closely aligned with categories from 
Amazon (e.g. electronics topic e111, or clothing topic c110), 
though this is to be expected since our topic model is built on 
top of an explicit hierarchy, as illustrated in an earlier
figure. However, we note that finer-grained ‘microcategories’
are discovered that are not explicitly present in the hierarchy,
e.g. high-end headphones are distinguished from cheaper models
(e89 and e99), and running shoes are distinguished based on
distinct running styles (c133, c134, and c156). 

We also note that brand words predominate in several topics,
e.g. high-end headphones can be identified by words like 
‘Sennheiser’, ‘AKG’ etc. (e89), and high-end t-shirts can be 
identified by words like ‘Ralph Lauren’ and ‘Geoffrey Beene’ 
(c52). At the extreme end, a single product may have its own 
topic, e.g. the popular ‘three-wolf moon’ shirt (c49), whose
reviews have already inspired academic discussion. Here the 
product’s high popularity and unique word distribution mean 
that dedicating an entire topic to it substantially increases
the corpus likelihood. Note that we only show a fraction of 
the full set of topics discovered by our model; other common 
camera brands (etc.) are covered among the large range of
topics not shown here. 

Finally, while some topics are highly specific, like those 
referring to individual products or brands, others are more 
generic, such as clothing topics c9 and c24. Such topics tend 
to appear toward the top of the category hierarchy (see the
hierarchy figure); for example, the topic c9 is associated with
the ‘Clothing’ node, whereas c24 is associated with its child,
‘Clothing: Men’, of which all other topics in Table 5 are
descendants. Intuitively, these are much like ‘background’
distributions, containing words that are relevant to the
majority of clothing products, like durability, fit, warmth,
color, packaging, etc. 

Figure 5: Results of our user study. Users were asked to select
which recommendations (ours or Amazon’s) were preferable
substitutes/complements (users could also select neither or 
both). 

4.7 User Study 

Finally we perform a user study to evaluate the quality of the 
recommendations produced by Sceptre. Naturally we would not 
expect that a fully-supervised algorithm would produce
predictions that were somehow ‘better’ than the ground-truth used
to train it. However, we hope Sceptre may correct for some noise 
in the ground-truth, since while users may often buy multiple 
pairs of jeans together (for example) we are explicitly training 
the system to identify complementary items that would not be 
substitutable. 

We used Mechanical Turk to compare Sceptre's recommendations
to Amazon's ‘also viewed’ and ‘also bought’ suggestions, for a
random sample of 200 Clothing items. Human judges identified
which recommendations they felt were acceptable substitutes and
complements (surfaced in a random order without labels; a
screenshot is shown in Fig. 6d). Judges evaluated the top
recommendation, and top-5 recommendations separately, yielding
results shown in Figure 5. 

We see here that Amazon’s ‘also viewed’ links generate 
preferable substitutes, indicating that large volumes of
browsing data yield acceptable substitute relationships with
relatively little noise. On the other hand, Sceptre's complement
links are overwhelmingly preferred, suggesting that our decision
to model complements as non-substitutes qualitatively improves 
performance. 


5 Building the product graph 

Having produced ranked lists of recommended relationships, 
our final task is to surface these recommendations to a potential 
user of our system in a meaningful way. 

While conceptually simple, comparing all products against 
all others quickly becomes impractical in a corpus with
millions of products. Our goal here is to rank all links, and
surface those which have the highest likelihood under the model.
That is, for each graph type g we would like to recommend 

  $\mathrm{rec}_g(i) = \operatorname{argmax}_{S \in (\mathcal{T} \setminus \{i\})^R}$ (likelihood under the model of links from i to the products in S), 

where $S \in (\mathcal{T} \setminus \{i\})^R$ is a set of R products other than i itself. 

While computing the score for a single candidate edge is 
very fast ($O(K)$ operations), on a dataset with millions of
products this already results in an unacceptable delay when
ranking all possible recommendations. As in previous work, we
implemented two modifications that make this enumeration
procedure feasible (on the order of a few milliseconds). The
first is to ignore obscure products by limiting the search space
by some popularity threshold; we consider the hundred-thousand
most popular products per category when generating new
recommendations. The second is to cull the search space using
the category tree explicitly; e.g. when browsing for running
shoes we can ignore, say, camera equipment and limit our search
to clothing and shoes. Specifically, we only consider items
belonging to the same category, its parent category, its child
categories, and its sibling categories (in other words, its
‘immediate family’). It is very rare that the top-ranked
recommendations belong to distant categories, so this has
minimal impact on performance. 
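
The two pruning steps can be sketched as a candidate-generation function. Everything named here (`candidate_items`, the `category_of`, `parent_of`, and `popularity` mappings, and the `max_per_category` parameter) is a hypothetical layout chosen for illustration, not the authors' implementation.

```python
def candidate_items(query, category_of, parent_of, popularity,
                    max_per_category=100_000):
    """Return candidate products from the query's 'immediate family' of
    categories, keeping only the most popular items in each category."""
    cat = category_of[query]
    parent = parent_of.get(cat)
    family = {cat}
    if parent is not None:
        family.add(parent)
        # Siblings are the categories that share the same parent.
        family |= {c for c, p in parent_of.items() if p == parent}
    # Children are the categories whose parent is the query's category.
    family |= {c for c, p in parent_of.items() if p == cat}

    candidates = []
    for c in sorted(family):
        items = [i for i, ic in category_of.items() if ic == c and i != query]
        items.sort(key=lambda i: (-popularity.get(i, 0), i))
        candidates.extend(items[:max_per_category])
    return candidates
```

Only the returned candidates then need to be scored against the query, rather than every product in the corpus.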

Another issue is that of adding new products to the system. 
Naturally, it is not feasible to re-train the system every time a 
new product is added. However, this is thankfully not
necessary, as the introduction of a small number of products
will not fundamentally change the word distribution $\phi$. Thus it
is simply a matter of estimating the product’s topic
distribution under the existing model, as can be done using LDA. 
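
To illustrate the idea of estimating a new product's topic distribution while the word distributions stay fixed, the sketch below uses a simple maximum-likelihood EM ‘fold-in’. This is a stand-in chosen for brevity, not the LDA inference procedure the text refers to; the function name and the representation of `phi` as per-topic lists of word probabilities are assumptions.

```python
def fold_in(word_ids, phi, n_iter=50):
    """Estimate topic proportions for one new document with phi held fixed."""
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(n_iter):
        counts = [0.0] * num_topics
        for w in word_ids:
            # E-step: share of responsibility each topic takes for word w.
            weights = [theta[k] * phi[k][w] for k in range(num_topics)]
            total = sum(weights)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for k in range(num_topics):
                counts[k] += weights[k] / total
        norm = sum(counts)
        if norm == 0.0:
            break
        # M-step: renormalize the expected counts into proportions.
        theta = [c / norm for c in counts]
    return theta
```

Because `phi` is never updated, adding a product costs only a few passes over that product's own text.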

When assembling our user interface (see the interface figures),
we use the discovered topics from Section 4.6 to ‘explain’
recommendations to users, by selecting sentences whose language 
best explains why the recommended product was predicted. 
Specifically, we highlight sentences whose words yield the 
largest response to $F_\beta^{\leftrightarrow}$. 

Reproducing Sceptre. All data and code used in this paper, as
well as the interface shown in our figures, are available on 
the first author’s webpage: http://cseweb.ucsd.edu/~jmcauley/ 


6 Conclusion 

A useful recommender system must produce recommendations 
that not only match our preferences, but which are also
relevant to our current topic of interest. For a user browsing a 
particular product, two useful notions of relevant
recommendations include substitutes and complements: products
that can be purchased instead of each other, and products that
can be purchased in addition to each other. In this paper, our
goal has been to learn these concepts from product features,
especially from the text of their reviews. 

We have presented Sceptre, a model for predicting and
understanding relationships between linked products. We have
applied this to the problem of identifying substitutable and
complementary products on a large collection of Amazon data,
including 144 million reviews and 237 million ground-truth
relationships based on browsing and co-purchasing logs. 

[Figure 6 panels, each listing Substitutes and Complements:
(a) Men’s clothing, query product ‘Levi’s Men’s 501 Original Fit Jean’;
(b) Women’s clothing, query product ‘Charming Women Girls Warm
Thin Soft Long Scarf Wrap Shawl Silk Scarves’;
(c) query product ‘Canon EOS 5D Mark III 22.3 MP Full Frame CMOS’;
(d) mturk interface, showing ‘AIAIAI TMA-1 DJ Headphones without
Mic, Black’.] 

Figure 6: (a,b,c) Examples of recommendations produced by
Sceptre; the top of each subfigure shows the query product, the 
left column shows substitutes recommended by Sceptre, and the
right column shows complements. (d) Interface used to evaluate 
Sceptre on Mechanical Turk; Turkers are shown lists of items
suggested by Amazon (i.e., the ground-truth) and Sceptre and must 
identify which lists they prefer. 

                                 Accuracy          Error reduction vs. random
Category           Method     Subst.    Compl.     Subst.    Compl.
Men’s Clothing     Random     60.27%    57.70%     0.0%      0.0%
                   LDA        70.62%    65.95%     26.05%    19.50%
                   CT         78.69%    61.06%     46.38%    7.946%
                   Sceptre    96.69%    94.06%     91.67%    85.97%
Women’s Clothing   Random     60.35%    56.67%     0.0%      0.0%
                   LDA        70.70%    64.80%     26.11%    18.75%
                   CT         81.05%    69.08%     52.21%    28.63%
                   Sceptre    95.87%    94.14%     89.59%    86.47%
Music              Random     -         50.18%     -         0.0%
                   LDA        -         52.39%     -         4.428%
                   CT         -         57.02%     -         13.71%
                   Sceptre    -         90.43%     -         80.78%
Movies             Random     -         51.22%     -         0.0%
                   LDA        -         54.26%     -         6.235%
                   CT         -         66.34%     -         30.99%
                   Sceptre    -         85.57%     -         70.42%
Electronics        Random     69.98%    55.67%     0.0%      0.0%
                   LDA        89.90%    61.90%     66.35%    14.06%
                   CT         87.26%    60.18%     57.57%    10.17%
                   Sceptre    95.70%    88.80%     85.69%    74.74%
Books              Random     69.93%    55.35%     0.0%      0.0%
                   LDA        89.91%    60.59%     66.47%    11.75%
                   CT         87.80%    66.28%     59.42%    24.49%
                   Sceptre    93.76%    89.86%     79.25%    77.29%
Baby Clothes       Random     62.93%    52.47%     0.0%      0.0%
                   LDA        75.86%    54.73%     34.89%    4.75%
                   CT         79.31%    64.56%     44.18%    25.43%
                   Sceptre    92.18%    93.65%     78.91%    86.65%
Average            Sceptre    94.83%    90.23%     85.02%    80.33%

Table 3: Link prediction accuracy for substitute and complement
links (the former are not available for the majority of
Music/Movies products in our dataset). Absolute performance is 
shown at left, reduction in error vs. random classification at 
right. 
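
The ‘error reduction vs. random’ columns appear to follow from the accuracy columns by comparing error rates. The formula below is inferred from the reported numbers rather than stated in the text, so treat it as an assumption; the function name is ours.

```python
def error_reduction(accuracy, random_accuracy):
    """Percentage of the random baseline's error that a method removes.

    Both arguments are accuracies in percent, as reported in the table.
    """
    random_error = 100.0 - random_accuracy
    model_error = 100.0 - accuracy
    return 100.0 * (random_error - model_error) / random_error
```

For the Men's Clothing substitute column, Sceptre's 96.69% accuracy against the random baseline's 60.27% gives `error_reduction(96.69, 60.27)`, which is about 91.67 and matches the tabulated value.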


                              Accuracy          Error reduction vs. random
Category                   Subst.    Compl.     Subst.    Compl.
Electronics, cold-start    91.28%    93.22%     70.95%    84.71%
Books, cold-start          96.76%    93.67%     89.22%    85.81%

Table 4: Link prediction accuracy using cold-start data
(manufacturer’s and editorial descriptions). 